feat: Add video frame format render option #1142

Open
xuelongmu wants to merge 1 commit into heygen-com:main from xuelongmu:video-frame-format-png-extraction

@xuelongmu xuelongmu commented May 31, 2026

Summary

Adds a first-class render option for source-video frame extraction:

hyperframes render --video-frame-format <auto|jpg|png>

and the matching programmatic config field:

videoFrameFormat?: "auto" | "jpg" | "png";

The default remains auto, preserving the existing behavior: alpha or alpha-capable video sources extract as PNG, and opaque sources extract as JPG.

Rationale

When compositing screen recordings, for example an iPhone UI recording, the timeline/browser preview can look close to the source while the rendered MP4 shifts saturated UI colors. Changing final H.264 color tags or range does not address this when the color shift has already happened earlier in the pipeline.

Pixel checks showed the shifted color was already present in HyperFrames' captured browser PNG frames. The root cause is that non-alpha source videos were extracted as JPEG frames before browser capture, which can visibly change saturated UI reds before final encoding ever runs. In one representative sample, the original phone recording frame sampled RGB(221,56,46) on a saturated red UI indicator, while HyperFrames' browser PNG frame sampled RGB(186,51,58). A direct ffmpeg source-video composite preserved the same pixel closely at RGB(220,56,50), showing the final H.264 encoder can preserve the color when the source video does not pass through JPEG extraction.
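The size of the shift in the samples above can be expressed as a maximum per-channel delta. The following is a minimal sketch of that arithmetic; `maxChannelDelta` and the variable names are illustrative and are not taken from the HyperFrames codebase.

```typescript
type Rgb = readonly [number, number, number];

// Largest absolute per-channel difference between two 8-bit RGB samples.
function maxChannelDelta(a: Rgb, b: Rgb): number {
  return Math.max(
    Math.abs(a[0] - b[0]),
    Math.abs(a[1] - b[1]),
    Math.abs(a[2] - b[2]),
  );
}

// Sample values quoted in the paragraph above.
const sourceFrame: Rgb = [221, 56, 46]; // original phone recording
const browserPngFrame: Rgb = [186, 51, 58]; // HyperFrames captured browser frame
const directFfmpegFrame: Rgb = [220, 56, 50]; // direct ffmpeg composite

console.log(maxChannelDelta(sourceFrame, browserPngFrame)); // 35
console.log(maxChannelDelta(sourceFrame, directFfmpegFrame)); // 4
```

By this measure the captured browser frame is off by 35 on the red channel, while the direct ffmpeg composite stays within 4.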

This PR adds an explicit, opt-in PNG extraction path for UI recordings, screen captures, and other color-sensitive source videos. It avoids an RGB lift or color-correction workaround and leaves final encoder defaults unchanged.

What Changed

  • Adds --video-frame-format auto|jpg|png to hyperframes render.
  • Adds RenderConfig.videoFrameFormat.
  • Threads the option through local render, Docker render, producer server render, and distributed planning.
  • Updates resolveFrameFormat so:
    • explicit png extracts opaque videos as PNG
    • explicit jpg extracts opaque videos as JPG
    • auto/undefined preserves existing behavior
    • alpha and alpha-capable codecs still force PNG
  • Ensures extraction cache entries use the effective frame format, so JPG and PNG frame caches cannot collide.
  • Documents the option in CLI and rendering docs.
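The precedence rules and the cache separation listed above can be sketched roughly as follows. This is a simplified illustration, not the PR's actual code: `VideoMetadataSketch`, the body of `codecMayHaveAlpha`, and `frameCacheKey` are assumptions made for the example.

```typescript
type VideoFrameFormat = "auto" | "jpg" | "png";
type CacheFrameFormat = "jpg" | "png";

// Hypothetical, simplified stand-in for the engine's video metadata.
interface VideoMetadataSketch {
  hasAlpha: boolean;
  videoCodec: string;
}

// Assumption: an illustrative codec list only; the engine's real check may differ.
function codecMayHaveAlpha(codec: string): boolean {
  return ["vp8", "vp9", "prores"].includes(codec);
}

function resolveFrameFormat(
  metadata: VideoMetadataSketch,
  requested?: VideoFrameFormat,
): CacheFrameFormat {
  // Alpha and alpha-capable codecs always force PNG.
  if (metadata.hasAlpha || codecMayHaveAlpha(metadata.videoCodec)) return "png";
  // An explicit jpg/png request only overrides the opaque path.
  if (requested === "png" || requested === "jpg") return requested;
  // auto/undefined keeps the existing behavior for opaque sources.
  return "jpg";
}

// Hypothetical cache key: it is derived from the effective format, not the raw request.
function frameCacheKey(
  videoPath: string,
  metadata: VideoMetadataSketch,
  requested?: VideoFrameFormat,
): string {
  return `${videoPath}:${resolveFrameFormat(metadata, requested)}`;
}

const opaque: VideoMetadataSketch = { hasAlpha: false, videoCodec: "h264" };
console.log(frameCacheKey("clip.mp4", opaque, "png")); // clip.mp4:png
console.log(frameCacheKey("clip.mp4", opaque, "auto")); // clip.mp4:jpg
```

Keying on the effective format means `auto` and an explicit `jpg` can share one cache entry for an opaque source, while a `png` request for the same file gets its own.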

Validation

  • bun run --filter @hyperframes/engine test -- src/services/videoFrameExtractor.test.ts
  • bun run --filter @hyperframes/cli test -- src/commands/render.test.ts src/utils/dockerRunArgs.test.ts
  • bun run --filter @hyperframes/aws-lambda test -- src/sdk/validateConfig.test.ts
  • bun run --filter @hyperframes/engine typecheck
  • bun run --filter @hyperframes/producer typecheck
  • bun run --filter @hyperframes/cli typecheck
  • bun run --filter @hyperframes/aws-lambda typecheck
  • bunx oxfmt --check
  • bunx oxlint

The new regression test synthesizes a tiny saturated-red UI fixture on a pale pink background and verifies PNG extraction keeps sampled red pixels within a max channel delta of 5 from the decoded source, while also proving JPG and PNG extraction caches remain separate.

@xuelongmu xuelongmu marked this pull request as ready for review June 1, 2026 01:08
@xuelongmu xuelongmu changed the title from "Add video frame format render option" to "feat: Add video frame format render option" Jun 1, 2026
Collaborator

@jrusso1020 jrusso1020 left a comment

Approve ✅

Solid PR with a real root cause and good evidence. I reproduced the pipeline locally to sanity-check both the diagnosis and the chosen fix, and it holds up.

Verified the root cause

Synthesized a saturated-red UI element (RGB(221,56,46)) encoded as a typical h264/yuv420p screen recording, then extracted a frame the way the engine does (-vf fps -q:v 2):

Extraction                       | Sampled pixel | maxΔ from source
JPEG-420 (current)               | (207,40,48)   | 16
JPEG-444 (no chroma subsampling) | (207,40,47)   | 16
PNG (this PR)                    | (220,55,45)   | 1

The shift is not chroma subsampling — JPEG-444 shifts the saturated red just as much as 420. It's intrinsic to JPEG's RGB→YUV→RGB roundtrip on saturated colors. PNG stays in RGB end-to-end, so it's the only thing that actually preserves the pixel. There is no cheaper JPEG-side workaround, which makes PNG the correct fix here.

Why this is the right shape

This is fundamentally a preview/render fidelity gap: Studio preview plays the source <video> directly in Chrome (RGB, no extraction), while render replaces it with pre-extracted frames before capture — so the JPEG shift is baked in before the encoder runs. That's exactly why touching final H.264 color tags can't fix it, as the description notes.

Keeping the default at auto is the right call. I benchmarked PNG-for-everything on a photographic 1080p frame: ~8x slower extraction and ~5x larger (a 60s 1080p30 render goes from ~1.4 GiB to ~6.8 GiB of intermediate frames), and on photographic content the JPEG shift is perceptually invisible. So a blanket PNG default would tax the common case heavily — and on Lambda that frame bloat risks blowing /tmp ephemeral storage outright, not just slowing things down. Opt-in is correct.
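As a rough sanity check on those totals, the per-frame sizes they imply can be worked out directly. This sketch uses only the figures quoted above (60 s at 30 fps, ~1.4 GiB vs ~6.8 GiB); the constant names are illustrative.

```typescript
const MIB_PER_GIB = 1024;

const frameCount = 60 * 30; // 60 s at 30 fps = 1800 extracted frames
const jpgTotalGiB = 1.4; // approximate total quoted above
const pngTotalGiB = 6.8; // approximate total quoted above

const jpgPerFrameMiB = (jpgTotalGiB * MIB_PER_GIB) / frameCount;
const pngPerFrameMiB = (pngTotalGiB * MIB_PER_GIB) / frameCount;
const sizeRatio = pngTotalGiB / jpgTotalGiB;

console.log(jpgPerFrameMiB.toFixed(2)); // 0.80 MiB per JPG frame
console.log(pngPerFrameMiB.toFixed(2)); // 3.87 MiB per PNG frame
console.log(sizeRatio.toFixed(2)); // 4.86, i.e. the "~5x" above
```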

Suggested follow-up (not blocking)

The --video-frame-format flag is render-global, but the need is per-source: a composition mixing a UI screen-recording with photographic b-roll has to choose all-or-nothing. A per-<video> hint (e.g. data-frame-format="png", consistent with data-has-audio/data-start) would let authors opt the screen-recording clip into PNG while b-roll stays JPEG — closing the preview/render gap where it matters without the global cost. The flag is a good blunt override to keep alongside it.
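One way the suggested per-source hint could combine with the global flag is sketched below. Everything here is hypothetical: `data-frame-format` is only a proposal in this review, and `parseFrameFormatHint` and `requestedFormatForSource` are names invented for the example.

```typescript
type VideoFrameFormat = "auto" | "jpg" | "png";

// Validate a raw attribute value (for example, what getAttribute("data-frame-format") returns).
function parseFrameFormatHint(
  raw: string | null | undefined,
): VideoFrameFormat | undefined {
  return raw === "auto" || raw === "jpg" || raw === "png" ? raw : undefined;
}

// Hypothetical precedence: an explicit per-source jpg/png hint wins,
// otherwise the render-global flag applies, otherwise "auto".
function requestedFormatForSource(
  rawHint: string | null | undefined,
  globalFlag?: VideoFrameFormat,
): VideoFrameFormat {
  const hint = parseFrameFormatHint(rawHint);
  if (hint === "jpg" || hint === "png") return hint;
  return globalFlag ?? "auto";
}

// A screen recording opts into PNG while b-roll follows the global default.
console.log(requestedFormatForSource("png", "auto")); // png
console.log(requestedFormatForSource(null, "auto")); // auto
```

The result would still pass through the alpha check, so a per-source `jpg` hint could not strip transparency.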

Code is clean, threaded consistently through local/Docker/producer/distributed, cache key correctly incorporates the effective format so JPG/PNG caches can't collide, and the regression test is well-targeted. 👍

requested?: VideoFrameFormat,
): CacheFrameFormat {
if (metadata.hasAlpha || codecMayHaveAlpha(metadata.videoCodec)) return "png";
if (requested === "png" || requested === "jpg") return requested;
Collaborator

Nice — moving the alpha/alpha-capable check ahead of requested isn't just cosmetic, it's a latent bugfix. Under the old order (if (requested) return requested), an explicit jpg would have stripped alpha from a transparent source. Now alpha always wins and jpg/png only override the opaque path, which is the correct precedence.
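The difference between the two orderings can be shown with a reduced example. This is a sketch under assumptions: metadata is collapsed to a single `hasAlpha` boolean, and `resolveOldOrder` only approximates the earlier ordering as this comment describes it.

```typescript
type Requested = "auto" | "jpg" | "png" | undefined;
type Resolved = "jpg" | "png";

// Approximation of the old order: an explicit request short-circuits the alpha check.
function resolveOldOrder(hasAlpha: boolean, requested: Requested): Resolved {
  if (requested === "jpg" || requested === "png") return requested;
  return hasAlpha ? "png" : "jpg";
}

// New order: alpha is checked first, so a request only affects opaque sources.
function resolveNewOrder(hasAlpha: boolean, requested: Requested): Resolved {
  if (hasAlpha) return "png";
  if (requested === "jpg" || requested === "png") return requested;
  return "jpg";
}

// Transparent source with an explicit jpg request:
console.log(resolveOldOrder(true, "jpg")); // jpg (alpha would be lost)
console.log(resolveNewOrder(true, "jpg")); // png (alpha preserved)
```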

@miguel-heygen
Collaborator

@xuelongmu can you rebase on main and resolve the merge conflicts? After that, feel free to merge it!
